Contact Tracing: Centralize or Decentralize?
Laurin Weissinger, Yale Law School
Contact tracing is one of the few tools we have to dampen the impact of COVID-19. Considering development and testing timelines, effective and available vaccines or treatments are likely months, maybe years away. Therefore, reducing the spread of the disease will remain central for quite a while. Contact tracing is a proven approach that can help us do so, and a debate has arisen over whether to use centralized or decentralized contact tracing applications to slow the spread of the virus. At the time of writing, neither the overwhelming majority of applications nor the often-underlying Google/Apple tracing API has been deployed yet. Thus, rather than speaking to specific applications, this piece will dive into basic architecture differences and the issue of trust in those that run the systems.
Health authorities all over the world have long traced potential infection paths for critical illnesses using manual methods. In addition, the use of smartphones and Bluetooth/GPS has essentially become “the obvious” solution to support and supplement such manual infection tracing and data analysis. While manual, expert-led tracing usually results in better outcomes, digital applications can scale more easily and can reach people quickly. For example, an application could instantly notify those at risk and, potentially more importantly, inform those at lower risk. As health authorities are taxed and under a lot of pressure at the moment, some automation might help ease their load.
As noted, key specifications of applications and their back-ends are in flux. These important uncertainties aside, there are complex security and architectural decisions that always have to be made when choosing an architecture for such an application and, crucially, its backbone structure and operations.
Essentially, contact tracing via phones using Bluetooth, GPS, and/or cell towers can take two forms. In the first, the phone keeps track of the peers it has met (or, theoretically, the locations and time stamps) locally on the device, i.e. decentralized on a system level, and regularly retrieves a central list of reported/known positives or at-risk areas, which it then compares to its own data store. In the second, the processing of data is done centrally: all or most data are sent to one processing cluster, and everything is controlled and analyzed from there. While not the focus of this piece, middle-of-the-road options are available and likely to be used in practice.
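To make the difference between the two forms concrete, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not a description of any deployed application or of the Google/Apple API: the class names, method names, and the use of plain strings as device identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DecentralizedPhone:
    """Hypothetical phone in the decentralized form: encounters stay on the device."""
    seen: dict = field(default_factory=dict)  # peer id -> list of timestamps

    def record_encounter(self, peer_id: str, timestamp: int) -> None:
        self.seen.setdefault(peer_id, []).append(timestamp)

    def check_exposure(self, reported_positive: set) -> set:
        # The published list of positives is compared locally;
        # the phone's own encounter log is never uploaded.
        return {peer for peer in reported_positive if peer in self.seen}


class CentralServer:
    """Hypothetical back-end in the centralized form: all encounters are uploaded."""

    def __init__(self) -> None:
        self.encounters = []  # (reporting device, observed peer, timestamp)

    def upload(self, reporter: str, peer: str, timestamp: int) -> None:
        self.encounters.append((reporter, peer, timestamp))

    def contacts_of(self, positive_id: str) -> set:
        # The operator can reconstruct who met whom for any device.
        contacts = set()
        for reporter, peer, _ in self.encounters:
            if reporter == positive_id:
                contacts.add(peer)
            elif peer == positive_id:
                contacts.add(reporter)
        return contacts
```

In the first class, only the list of reported positives travels over the network; in the second, the operator holds the full contact graph, which is what enables both the richer analysis and the greater potential for abuse.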
Theoretically, the centralized architecture can be more intrusive in terms of privacy than a decentralized architecture, but centralization does allow for some functionality that is harder or impossible to implement without central storage and control. In actual practice, the exact system implementation and architecture matter: a decentralized system can unduly invade people’s privacy, while a centralized system can have multiple layers of effective privacy controls. No matter which high-level architectural approach is chosen, a variety of questions need to be answered to assess the impact on privacy and security:
- What data is collected in the first place, and by which method? (e.g. GPS, Bluetooth, cell towers, a combination thereof)
- Are any personal data being collected? Can they be shared (e.g. when reporting oneself positive)? If they were to be shared, with whom?
- Will data be anonymized, and how?
- How long will data be stored, what is stored, where, and under whose authority? (Centralized or decentralized model, or a hybrid approach)
- How is this storage secured, and how is the transfer protected?
- Is the use of the data limited to public health concerns, and what controls are in place to ensure this? How is public health defined?
- Are documents about the risks, controls, application design, and source code available for review by the public, or at least independent experts?
- How does the system tackle external abuse (e.g. false reports) but also internal abuse (e.g. unauthorized tracking of individuals by insiders)?
- Is the system voluntary or required by law?
- How does the system link with other databases and data sources, and potentially health authorities? [1]
- What happens to the data, and the tracking functionality or API after the application is not needed anymore?
The Centralized Option
A centralized tracing system has one key benefit and key issue: central control and insight. This has various upsides, including the ability to correct wrong, e.g. accidental, reports. The ability to counter abuse from external parties, e.g. dealing with purposefully wrong reports, would be far greater as well. Central data storage and processing would also simplify research, providing a larger dataset to epidemiologists and health authorities. Correlating the application’s data with databases already held by public health agencies would further enhance research opportunities. Better data and analysis could improve our understanding of the virus and transmission, and in turn help experts optimize tracing methods as well as our response more generally.
On the other hand, saving data centrally, in this case sensitive personal health data and information, can be risky in a variety of ways: losing the data center would mean losing all data and tracking, at least until backups can be restored, and such infrastructure might be a key target for attackers.
Furthermore, a central administrator and data controller would likely face considerable legal and (consequently) financial risk. This could include lawsuits from businesses shut down due to false positive reports, resulting in legal fees and potential damage payments. Likewise, whoever runs the system might be held responsible in case the system fails to register a case of COVID-19 (false negative), with the consequences of infections, hospitalization and care costs, and sometimes even death. Thus, without legal immunity of some sort and in the absence of considerable financial reward, not wanting to be the controller of such a tracing system is a pretty rational choice, and one that Apple and Google seem to have taken. Specifically, Google and Apple opted to develop an Application Programming Interface (API) together.[2] This API gives application programmers access to functionality in Android and iOS that allows for tracking of other devices in proximity. Their approach is decentralized and privacy-respecting, and crucially they are *not* building the actual application themselves but leave this to other parties.
The key concern that most people seem to have about tracing applications is that of abuse by either the centralized system's controller or the relevant government. Governments could expand the use of the data and application, or coerce the private entity running the tracing backbone to share data in order to target minorities and political opponents. Some states have already included predictors like religion in their models without being able to provide public health reasons for their inclusion. Based on what we know about the data economy, it is also not irrational to believe that the Silicon Valley giants, or at least smaller tracking firms, would use the proximity tracking system and API, or at least the “lessons learned”, to further develop their data mining and surveillance infrastructures.
The Decentralized Option
A decentralized system likewise has one key benefit and key issue: its lack of central control and storage. This approach leads to less control when it comes to false reports, and does not allow for easy and comprehensive data analysis. When trying to trace infection pathways, more data in terms of quality and quantity can be extremely useful: the more we know about the population, their individual characteristics, and their movement patterns, the more accurate our modelling and predictions can become. The only problem is that whoever does the analysis – and whoever can coerce them – would be able to use (and abuse) all that data.
A decentralized approach, as supported by Apple and Google and in development in most countries, cannot provide these deep insights. While weaker on potential tracing possibilities, the lack of central data storage, control, and processing reduces a variety of risks: without a central data store, there are no (or at least fewer) obvious attack targets and less usable data in case of leaks. This also means that accidents or resets would destroy all proximity data, unless there is some additional backup routine. Crucially, the administrator would be technically unable, or at least considerably less able, to control the system, making them less at risk of legal ramifications. Also, with little data being stored centrally, potential abuse by either state or private parties would be naturally minimized. However, without the ability to administer and manage the system, some attacks, like coordinated mass false positive reports, would be much harder to address.
In short, a more centralized system, if run by a well-resourced, trustworthy and trusted party, could provide better service during the pandemic. Such a system would allow better data analysis and tracing, and crucially provide a moderation function to curb abuse and increase data accuracy. Being fully trustworthy, the administrator and owner would make sure, technically and procedurally, that data cannot be and are not abused by their staff or external parties, including the government. The trusted party would also delete information after use, remove the application and all APIs after the system is no longer needed, and so on. Unfortunately, this scenario is pretty unrealistic in the real world. Thus, when it comes to longer term risks, e.g. development of even more intrusive tracking technology for government or private use, the decentralized but likely less effective option appears considerably less problematic overall. Keeping in mind previous actions by some governments, and the advertising and data-mining driven world we inhabit, the use of such systems against the public interest is not unrealistic.
Unfortunately, the absence of at least some control or moderation of (digital) systems usually ends pretty badly. There have always been people trying to undermine and destroy communities and applications, and we have little reason to doubt that intentional false reports and similar attempts to disrupt the operation of these applications will take place. However, unlike Facebook, Twitter, or Instagram, contact tracing apps will likely become at least quasi-required, deal with protected health information (PHI), and will not allow for selective presentation of the self[3] as long as they are used as intended. Thus, the potential real-life impact of tracing applications might be far greater. Suspected infections are likely to lead to quarantine and other limitations, while actual infections can cause severe illness and death.
As mentioned previously, both concepts presented here are extreme forms. The problems outlined can be addressed to some extent while keeping the overall architecture in place. For example, through the use of clever cryptography, a decentralized system could require users to confirm their self-report with a one-use secret that they receive with a positive test result. This one-use code would confirm a legitimate test but would not be traceable to a specific individual. Similarly, there are various approaches, cryptographic but also architectural and procedural, that would at least considerably complicate the identification of individuals even if their data were to be centrally stored and analyzed (e.g. differential privacy, homomorphic encryption).
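The one-use secret idea can be sketched roughly as follows. This is a simplified illustration, assuming a hypothetical `ReportVerifier` run on behalf of the testing authority; the class and method names are invented for this example, and a real deployment would need additional safeguards (expiry, rate limiting, secure delivery of the code to the patient).

```python
import hashlib
import secrets


class ReportVerifier:
    """Hypothetical verifier: issues one-use codes with positive test results."""

    def __init__(self) -> None:
        # Only digests of unredeemed codes are kept; no name or phone
        # identifier is stored next to them.
        self._unused = set()

    def issue_code(self) -> str:
        # Handed to the patient together with a positive test result.
        code = secrets.token_urlsafe(16)
        self._unused.add(hashlib.sha256(code.encode()).hexdigest())
        return code

    def redeem(self, code: str) -> bool:
        # Accept a self-report only if the code was issued and not yet used.
        digest = hashlib.sha256(code.encode()).hexdigest()
        if digest in self._unused:
            self._unused.remove(digest)
            return True
        return False
```

A self-report accompanied by a valid code is accepted exactly once, and a made-up or reused code is rejected. Because the verifier keeps no record of who received which code, a redeemed code confirms that a test took place without pointing to a specific person, provided the issuing step itself does not log that link.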
Last but not least, it is important to keep in mind a variety of inherent limitations: tracing apps, particularly when their use is not legally enforced, can only be a building block in a larger strategy, and likely only a small one. First, any tracing app built upon mobile devices as a platform will be inherently limited. Cell phones might be pretty good at creating movement profiles, but they are not built for exact proximity tracking: they do not have optimized hardware for this task and have to conserve battery power. Furthermore, especially without government mandates, phones might be forgotten at home, turned off, stored in places that limit the reach of their wireless adapters, and so on. Second, contact tracing is much less important than, and can only be gainfully used in combination with, sufficiently widespread, accurate testing regimes, social distancing, quarantining and self-isolation, tracking and tracing by actual experts, and government and private sector policies that allow people to actually reduce their risk when they have to interact with others.[4] In short, while automatically finding potential infection paths helps, having infections properly confirmed by actual tests and traced by experts is better, and denying the virus the ability to spread will always be most impactful.
To conclude, trade-offs will shape any tracing system: a perfect system that “does it all” does not and cannot exist. However, the key issue in this debate about apps is less about technology and system architectures than it is about trust. We are at a point in time where citizens all over the world do not trust their governments not to abuse the data collected in some way, and they also do not trust the “digital behemoths” to have the integrity to only use the tracking/tracing infrastructure and the resulting data for dealing with actual health concerns. In consequence, less powerful but “safer” decentralized systems have become the go-to solution. Finally, when it comes to actually containing the virus until treatments are available, contact tracing apps can only be effective as part of a functional, overall strategy. Due to their inherent limitations, they will struggle to be of use on their own, no matter what architecture or approach they follow.
Footnotes:
[1] In various countries, confirmed infections have to be reported to health authorities. While apps could automate such reports, choosing to do so will add further layers of questions.
[2] Their cooperation is welcome, as this increases the number of devices that are compatible.
[3] I.e., they will not allow users to only present parts of their identity and still remain effective.
[4] E.g. work from home, enforced social distancing and no contact strategies where essential services are provided, financial support for individuals and small businesses, online and phone options, etc.